ABSTRACT

Genome-scale engineering of living organisms requires precise and economical methods to efficiently modify many loci within chromosomes. One such example is the directed integration of chemically synthesized single-stranded deoxyribonucleic acid (oligonucleotides) into the chromosome of Escherichia coli during replication. Herein, we present a general co-selection strategy in multiplex genome engineering that yields highly modified cells. We demonstrate that disparate sites throughout the genome can be easily modified simultaneously by leveraging selectable markers within 500 kb of the target sites. We apply this technique to the modification of 80 sites in the E. coli genome.

INTRODUCTION 

New capabilities that enhance the engineering of organisms at the whole-genome scale provide avenues to construct biological systems with new properties. Such engineering can produce minimized (1) or mosaic (2) genomes, or genomes that contain new genes, pathways, metabolisms and fundamentally different regulatory structures (3). However, these projects require significant time and resources when pursued by traditional genetic engineering or lab-scale evolution approaches. Prior work demonstrated that targeted chromosomal modifications could be efficiently introduced in Escherichia coli using synthetic oligonucleotides (oligos) complementary to the lagging strand of the replicating chromosome (4-6), a process we refer to as oligo-mediated λ-Red allelic replacement (AR). This type of method has also been used in other prokaryotic and eukaryotic systems (7-9). The approach offers several advantages. AR is an extremely general mechanism. User-defined oligos can target anywhere in the chromosome without a need for site-specific nucleases. No antibiotic or other functional selection is necessary, and the mutagenesis process leaves no sequence-based 'scars' in the genome. Furthermore, oligo-mediated genome engineering can be multiplexed and automated (10).

We recently described multiplex automated genome engineering (MAGE), which uses AR to combinatorially modify 24 targeted sites throughout the genome and thereby rapidly increase the output of a metabolic pathway (10). We have also applied MAGE to the modification of hundreds of genome sites of E. coli MG1655 in pursuit of a re-engineered genetic code (11). Herein, we present a general co-selection (CoS) strategy based on MAGE to isolate cells carrying many chromosomal modifications. We demonstrate that one or more selectable genetic markers (within ~500 kb of the targeted sites) can be used to obtain as many as eight targeted modifications in a single MAGE cycle. We further iterate these cycles to accumulate many more modifications over a 1.1 Mb span of the E. coli chromosome. This type of CoS strategy can also be applied to incorporate multiple new regulatory elements into the chromosome (12).

MATERIALS AND METHODS 

Multiplex automated genome engineering 

MAGE uses single-stranded oligos to modify the deoxyribonucleic acid (DNA) sequence of many different chromosomal sites in vivo, frequently applying multiple oligos over several iterations (10). A single cycle of singleplex MAGE is comparable with the oligo recombineering technique of Court and co-workers (4,6,13).

Allele switching and selection 

MAGE was used to switch gene functions off and on in vivo. This was done with CoS oligos that either introduce (and later remove) a stop codon or remove (and later re-introduce) the translation start site. In multiplex experiments, dilute quantities of these CoS oligos were used, typically at 1% (each) of the total oligo concentration. For example, we find that MAGE efficiency in a 10-plex cycle is maximized when using 0.5 μM of each oligo (5 μM total) and 0.05 μM of each CoS oligo. The selectable genes used in this study are kan, bla, cat and tolC. The lacZ gene was also used as a means to quantify and screen for modified cells.

Media, chemicals and reagents 

Liquid cultures of all strains were grown in LB-Lennox medium (referred to hereafter as LB). This contained tryptone (10 g/l), yeast extract (5 g/l) and NaCl (5 g/l), buffered to pH 7.45 with NaOH. Chloramphenicol (cat), kanamycin (kan) or carbenicillin (carb) was added to LB cultures or LB-agar plates (LB with 15 g agar/l) at concentrations of 20 μg/ml, 30 μg/ml or 50 μg/ml, respectively. X-Gal (40 μg/ml) and isopropyl β-D-1-thiogalactopyranoside (0.1 mM) were used on LB agar plates for functional assay of β-galactosidase activity. Multiplex polymerase chain reaction (PCR) kits were purchased from Qiagen (Cat no. 206143). Standard agarose gel electrophoresis reagents were used. Colicin E1 was expressed in strain JC411 and purified as described by Schwartz and Helinski (14).

Primers and other oligos 

All oligos were obtained from Integrated DNA Technologies with no additional purification. Unless designated otherwise, oligos for AR carried one of two protection schemes: two phosphorothioate linkages at each of the 3' and 5' termini, or four phosphorothioate linkages at the 5' terminus. We have found that protecting the oligo with a minimum of two phosphorothioate linkages at the 5' terminus improves MAGE efficiency by a factor of 2 (10).
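To make the two protection schemes concrete, here is a minimal Python sketch that writes out an oligo sequence with its terminal phosphorothioate bonds marked. The function name `add_phosphorothioates` is hypothetical. Using `*` after a base to denote a phosphorothioate bond to the next base is an assumed vendor ordering notation, so check the supplier's current syntax before ordering.

```python
def add_phosphorothioates(seq: str, five_prime: int = 2, three_prime: int = 0) -> str:
    """Annotate an oligo with phosphorothioate (PS) linkages at its termini.

    A '*' written after a base marks a PS bond between that base and the next
    one (assumed ordering notation). `five_prime` and `three_prime` count
    linkages, not bases.
    """
    seq = seq.upper()
    n_links = len(seq) - 1  # an oligo of length L has L - 1 internucleotide linkages
    if n_links < 1:
        raise ValueError("oligo must be at least 2 nt long")
    if five_prime < 0 or three_prime < 0 or five_prime + three_prime > n_links:
        raise ValueError("requested PS linkages exceed the number of linkages")
    out = []
    for i, base in enumerate(seq[:-1]):
        out.append(base)
        # linkage i joins seq[i] and seq[i + 1]
        if i < five_prime or i >= n_links - three_prime:
            out.append("*")
    out.append(seq[-1])
    return "".join(out)


# The two schemes described in the text, applied to a toy 10-nt sequence:
print(add_phosphorothioates("acgtacgtac", five_prime=2, three_prime=2))  # A*C*GTACGT*A*C
print(add_phosphorothioates("acgtacgtac", five_prime=4))                 # A*C*G*T*ACGTAC
```

Counting linkages rather than bases matches how the protection is described above ("two phosphorothioate linkages"). This avoids an off-by-one between bases and bonds.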

Escherichia coli strains 

Construction of the EcNR1, EcNR2 and EcFI5 strains was previously documented (10). In brief, our λ-Red construct (including the ampicillin resistance gene bla) was derived from a defective temperature-inducible λ prophage. It was introduced by P1 transduction into E. coli MG1655 to produce EcNR1 (Δbio::λ-Red-bla).

EcNR2 is an EcNR1 derivative with ΔmutS::cat. EcFI5 is an EcNR2 derivative with ΔgalK::kan that was made kanamycin and chloramphenicol sensitive by inactivating the kan and cat genes. This used two oligos, kan_off and cat_off, which introduced a nonsense mutation in each gene to create the genotype ΔgalK::kan(-) and ΔmutS::cat(-). EcBS2 is derived from EcFI5 using oligo bla_off to deactivate the bla gene. EcBS3 is derived from EcFI5 using oligo tolC_off to deactivate the tolC gene.

EcBS5 was derived from EcNR1 by (i) switching the mutS gene into an inactive state using MAGE and oligo mutS_off (Supplementary Table S4); (ii) deleting the endogenous tolC gene; and (iii) inserting the tolC gene within the region to be modified, between the yohC and yohD genes. (Experiments using strains other than EcBS5 used the endogenous tolC marker.) Colicin E1 expression strain JC411 was obtained from Roberto Kolter.

Oligo-mediated AR 

Allele-replacement-competent cells were generated as described previously (10). In brief, individual colonies from a freshly streaked overnight plate were inoculated into 3 ml LB aliquots and grown in a rotator drum at 300 rpm at 32°C. On reaching an OD600 of 0.7, the glass tubes were moved to a 42°C shaking water bath for 15 min to induce the expression of λ-Red proteins. Cells were then immediately chilled on ice for at least 5 min. They were subsequently made electrocompetent in 1 ml aliquots by twice pelleting and resuspending in cold sterile dH2O. Cells were finally concentrated 20-fold into 50 μl dH2O containing oligos. This 50 μl volume was electroporated with a Bio-Rad GenePulser (set to 1800 V, 25 μF, 200 Ω) using a 1-mm gap cuvette. Electroporated reactions were immediately added to 3 ml of warm LB medium and recovered for at least 3 h before plating on LB agar; this allows segregation of modified alleles and division into clonal daughter cells.

A typical multiplex MAGE recombination used a total oligo concentration of 5 μM (i.e. 250 pmol in the 50 μl electroporation volume). Thus, when using 10 non-selectable oligos, each was present at 0.5 μM (25 pmol in the 50 μl). When using CoS, each CoS oligo was added at 0.05 μM (2.5 pmol in the 50 μl, i.e. approximately 1% of the total oligo concentration).
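The pool arithmetic above (μM to picomoles, and the CoS fraction) can be checked with a few lines of Python. This is a minimal sketch: `plan_mix` and `OligoMix` are hypothetical names. It assumes the fixed 5 μM targeting budget is split evenly among the targeting oligos, with CoS oligos added on top.

```python
from dataclasses import dataclass


@dataclass
class OligoMix:
    per_target_uM: float    # concentration of each targeting oligo
    per_target_pmol: float  # amount of each targeting oligo in the cuvette
    per_cos_pmol: float     # amount of each CoS oligo in the cuvette
    total_pmol: float       # all oligos, targeting + CoS
    cos_fraction: float     # each CoS oligo as a fraction of all oligo


def plan_mix(n_targets: int, targeting_total_uM: float = 5.0,
             cos_uM_each: float = 0.05, n_cos: int = 1,
             volume_ul: float = 50.0) -> OligoMix:
    """Split a fixed targeting-oligo budget evenly and add dilute CoS oligos."""
    per_target_uM = targeting_total_uM / n_targets
    total_uM = targeting_total_uM + n_cos * cos_uM_each
    # 1 uM = 1 pmol/ul, so pmol in the cuvette = uM * ul
    return OligoMix(
        per_target_uM=per_target_uM,
        per_target_pmol=per_target_uM * volume_ul,
        per_cos_pmol=cos_uM_each * volume_ul,
        total_pmol=total_uM * volume_ul,
        cos_fraction=cos_uM_each / total_uM,
    )


mix = plan_mix(n_targets=10)
print(f"{mix.per_target_uM:.2f} uM / {mix.per_target_pmol:.1f} pmol per target oligo")
print(f"{mix.per_cos_pmol:.1f} pmol CoS oligo, {mix.cos_fraction:.2%} of all oligo")
# 0.50 uM / 25.0 pmol per target oligo
# 2.5 pmol CoS oligo, 0.99% of all oligo
```

The CoS fraction is computed against the total including the CoS oligo itself, which is why it comes out just under 1%. That is consistent with the "approximately 1%" stated in the protocol.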

Multiplex allele-specific colony PCR 

Multiplex allele-specific colony PCR (MASC-PCR), as described previously (11,15), was used to genotype sets of clones. This allowed us to estimate AR frequencies and to assess the distribution of modifications among groups of clones. In short, two sets of primers were synthesized for each genomic locus, one corresponding to the mutant allele and one to the wild-type (WT) allele. The forward primers were identical except at the 3'-terminal nucleotide, which matched the specific sequence of either the mutant or the WT allele. The reverse primer was the same for both alleles. Primers were designed for a target Tm of 62°C.

To query the genotype of a clone, the mutant and WT allele primer sets were run in separate colony PCR reactions. A clone containing the mutant allele generates PCR products only with the mutant allele primers and not with the WT primers, and vice versa for a clone with the WT allele. Non-specific amplification with both mutant and WT primers was observed when a suboptimal annealing temperature was used. Gradient PCR was therefore performed to determine the optimal annealing temperature experimentally; it tended to vary from 62°C to 67°C depending on the primer sequence.

Multiple loci were queried in a single PCR reaction using the Qiagen multiplex PCR kit and pooled primer sets. These produced amplicons of 100, 150, 200, 250, 300, 400, 500, 600, 700 and 850 bp, corresponding to up to 10 different genomic loci. In each 20 μl PCR reaction, the best MASC-PCR specificity was obtained with 1 μl of a 1:100 dilution (in water) of a saturated clonal culture (i.e. one grown from the colony to be assayed). The PCR program was as follows: heat activation and cell lysis for 15 min at 95°C; 26 cycles of denaturation for 30 s at 94°C, annealing for 30 s at the optimized temperature and extension for 80 s at 72°C; and a final extension for 5 min at 72°C. Gel electrophoresis on a 1.5% agarose gel produced the best separation for a 10-plex MASC-PCR reaction.
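Once the bands on the gel have been assigned to amplicon sizes, calling each locus from the paired WT and mutant reactions is a simple lookup. The Python sketch below follows the logic described above. It assumes band sizing has already been done, and `call_alleles` and the `site_NN` locus names are hypothetical.

```python
from typing import Dict, Set

# Amplicon sizes used for one 10-plex MASC-PCR pool (bp), as listed in the text.
AMPLICON_SIZES_BP = (100, 150, 200, 250, 300, 400, 500, 600, 700, 850)


def call_alleles(locus_by_size: Dict[int, str],
                 wt_bands: Set[int],
                 mut_bands: Set[int]) -> Dict[str, str]:
    """Call each locus from the bands seen in the paired WT and mutant reactions."""
    calls = {}
    for size, locus in locus_by_size.items():
        in_wt, in_mut = size in wt_bands, size in mut_bands
        if in_mut and not in_wt:
            calls[locus] = "mutant"
        elif in_wt and not in_mut:
            calls[locus] = "WT"
        elif in_wt and in_mut:
            # both primer sets amplified: likely non-specific priming,
            # e.g. an annealing temperature below the optimum
            calls[locus] = "ambiguous"
        else:
            calls[locus] = "no call"  # neither reaction produced this amplicon
    return calls


# Hypothetical locus names, one per amplicon size.
loci = {size: f"site_{i:02d}" for i, size in enumerate(AMPLICON_SIZES_BP, start=1)}
calls = call_alleles(loci, wt_bands={100, 150, 300, 850}, mut_bands={200, 250, 300})
print(calls["site_03"], calls["site_01"], calls["site_05"], calls["site_06"])
# mutant WT ambiguous no call
n_converted = sum(call == "mutant" for call in calls.values())  # 2 conversions in this clone
```

Counting the "mutant" calls per clone gives the number of conversions. That count is the quantity whose distribution across clones is analyzed in the Results.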

Multiplex allele-specific colony quantitative PCR 

To complement the MASC-PCR analyses, we also used a highly multiplexed quantitative PCR (qPCR) screen to rapidly identify the clones with the highest degree of modification. This technique is reported in full detail elsewhere (11). Two qPCR reactions were compared for each clone evaluated. One used 10 or more primer pairs matched to the unmodified TAG sites, and the other used the same number of primer pairs matched to the intended TAA modifications. The TAG reactions were expected to proceed most efficiently with a WT template, and the TAA reactions most efficiently with a fully modified template. Intermediate values between these extremes also provided an effective, though non-linear, gauge of the extent of modification for each clone.

Each colony was used as template for a pair of qPCR reactions, comparing the amplification efficiency with primers terminating in WT or targeted mutant sequence. The measurement for a given clone was then compared with the equivalent values measured for the unmodified starting strain (negative control). This reference value was subtracted from each ΔCq to yield a ΔΔCq, with unmodified clones scoring close to zero (as with the negative control colonies). The largest ΔΔCq values were expected to indicate the most modified clones, which we confirmed by genotyping clones with varying ΔΔCq values.

Large numbers of clones could be quickly assessed with this approach (up to 191 per 384-well plate, plus a negative control). A typical assessment of MAGE-cycled clones partitioned a 384-well plate into 4 groups of 96 wells. Each group of 96 wells was thus used to assay 48 colonies (44-46 queried colonies plus 2-4 control colonies) at 10 loci. After the most promising clones were identified, site-specific qPCR genotyping was used to determine which specific sites had been modified, and the best clones were selected for further modification.
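The ΔΔCq scoring above can be sketched in Python as follows. The text says a reference ΔCq from the unmodified parent is subtracted, but not which reaction is subtracted from which. Defining ΔCq as Cq(WT primers) minus Cq(mutant primers) is therefore an assumption, chosen so that more modified clones score higher. The function names and Cq values are hypothetical.

```python
from typing import Dict, List, Tuple

# Two reactions per clone on a 384-well plate leave room for 191 clones + 1 negative control.
MAX_CLONES_PER_PLATE = 384 // 2 - 1


def delta_cq(cq_wt_primers: float, cq_mut_primers: float) -> float:
    # Modified templates amplify later with WT (TAG) primers and earlier with
    # mutant (TAA) primers, so this difference grows with the extent of modification.
    return cq_wt_primers - cq_mut_primers


def rank_clones(clones: Dict[str, Tuple[float, float]],
                negative_control: Tuple[float, float]) -> List[Tuple[str, float]]:
    """Return (clone, ddCq) pairs, most modified first.

    Each value is (Cq with WT primer pool, Cq with mutant primer pool).
    """
    reference = delta_cq(*negative_control)  # unmodified starting strain
    scored = [(name, delta_cq(wt, mut) - reference) for name, (wt, mut) in clones.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical Cq values for three colonies and the unmodified parent.
ranking = rank_clones(
    {"colony_A": (22.0, 24.5), "colony_B": (25.5, 20.0), "colony_C": (23.0, 22.0)},
    negative_control=(21.0, 25.0),
)
for name, ddcq in ranking:
    print(f"{name}: ddCq = {ddcq:+.1f}")
# colony_B: ddCq = +9.5
# colony_C: ddCq = +5.0
# colony_A: ddCq = +1.5
```

Because the response is non-linear, the ranking is used only to pick candidates. The top clones are then genotyped site by site, as described above.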

Individual bacterial colonies were picked from LB-agar plates by touching a 20 μl pipette tip to a colony and suspending this small amount of cells in 0.5 ml sterile distilled deionized water. Five microliters of this suspension was used as template in 20 μl qPCR reactions containing 1× NovaTaq buffer, 0.5 U NovaTaq Hotstart DNA Polymerase (EMD Biosciences), 250 μM each deoxynucleotide triphosphate (dNTP), 0.5× SYBR Green I (Invitrogen) and 5% dimethyl sulfoxide (DMSO). Primer concentrations were 50 nM for each of 10 primer pairs, i.e. 500 nM total forward primers and 500 nM total reverse primers.

A typical qPCR program included a 10 min hot start at 95°C, followed by 40 cycles (95°C for 30 s, 60°C for 30 s and 72°C for 30 s), and finished with a melt curve analysis. All reactions were performed in a 7900 HT system (Applied Biosystems). PCR primers for all sites were designed to have an estimated melting temperature of 62°C. Reverse primers were chosen to yield amplicons in the size range of 200-225 bp. No optimization was needed for qPCR primer sequences or for multiplex/singleplex reaction conditions.

Allele frequency qPCR to determine AR frequencies 

Allele frequency qPCR (AF-qPCR) was used to measure AR frequencies across the larger cell populations produced by MAGE. This technique queries the ensemble population of a cell culture instead of individual clones. The method of Germer et al. (16) was modified to more accurately quantify extremely high (>90%) and low (<10%) frequencies (P.A.C., B.S. and J.M.J., in preparation). As with MASC-PCR above, two pairs of primers are used, matching either WT or mutant sequences to discriminate between alleles.

AF-qPCR templates were either homogeneous cultures grown in LB (positive and negative controls) or heterogeneous mixtures carrying two alleles at a site. Most frequently, these were the TAG (WT) and TAA (mutant) stop codons described elsewhere (11). Each culture was used as template for a pair of PCR reactions, comparing amplification with primers terminating in WT or targeted mutant sequence. The difference between the two reactions (in amplification threshold cycle, ΔCq) for the control cultures defines the lower and upper (0% and 100%) limits of the measurement. The measurement for a mixed culture is then compared with these reference values to calculate a percentage representation for each allele in the pool. The AF method of Germer et al. (16) was used as a starting point, with refinements to the calculation to more accurately determine high frequencies (90-99%) and low frequencies (1-10%).

For each measurement, 5 μl of cell culture (typically diluted 1:100 or 1:1000 in dH2O) was used as template in 20 μl qPCR reactions containing 1× NovaTaq buffer, 0.5 U NovaTaq Hotstart DNA Polymerase (EMD Biosciences), 250 μM each dNTP, 0.5× SYBR Green I (Invitrogen) and 5% DMSO. Primer concentrations were 500 nM each. A typical qPCR program included a 10 min hot start at 95°C, followed by 40 cycles (95°C for 30 s, 60°C for 30 s and 72°C for 30 s), and finished with a melt curve analysis. All reactions were performed in a 7900 HT system (Applied Biosystems). PCR primers for all sites were designed to have an estimated melting temperature of 62°C. Reverse primers were chosen to yield amplicons in the size range of 200-225 bp.
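The refined AF-qPCR calculation is not given here (it is "in preparation"), so the Python sketch below is only a simplified toy model of how two pure controls can calibrate a mixed-culture ΔCq. It is not the method of Germer et al. or the authors' refinement. Its assumptions are:

- perfect doubling per cycle;
- a fixed offset c between the two primer sets;
- the same relative cross-amplification ε of each primer set on the wrong allele.

The two controls then determine c and ε, and the mutant fraction f follows in closed form. All names are hypothetical.

```python
import math


def mutant_fraction(dcq_sample: float, dcq_wt_control: float, dcq_mut_control: float) -> float:
    """Estimate the mutant-allele fraction of a mixed culture (toy model).

    Each dcq is Cq(WT primers) - Cq(mutant primers) for one template.
    Model assumptions: doubling every cycle, and both primer sets
    cross-amplify the wrong allele with the same relative efficiency eps.
    """
    r0 = 2.0 ** dcq_wt_control   # pure WT template:     r0 = c * eps
    r1 = 2.0 ** dcq_mut_control  # pure mutant template: r1 = c / eps
    if r0 >= r1:
        raise ValueError("controls do not discriminate the two alleles")
    c = math.sqrt(r0 * r1)    # fixed offset between the two primer sets
    eps = math.sqrt(r0 / r1)  # relative mismatch amplification (< 1)
    rho = 2.0 ** dcq_sample / c  # = (f + (1 - f) * eps) / ((1 - f) + f * eps)
    f = (rho - eps) / ((1.0 - eps) * (1.0 + rho))  # the line above solved for f
    return min(max(f, 0.0), 1.0)  # clip measurement noise outside the controls


def simulated_dcq(f: float, c: float = 2.0, eps: float = 2.0 ** -8) -> float:
    """Forward model used only to generate example data."""
    return math.log2(c * (f + (1 - f) * eps) / ((1 - f) + f * eps))


wt_ctrl, mut_ctrl = simulated_dcq(0.0), simulated_dcq(1.0)
print(round(mutant_fraction(simulated_dcq(0.2), wt_ctrl, mut_ctrl), 6))  # 0.2
```

The cross-amplification term ε is what limits accuracy near 0% and 100%. When ε is not negligible, a naive ratio of 2^ΔCq values misreads the extremes. This is consistent with the text's emphasis on refining the calculation for the 1-10% and 90-99% ranges.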

CoS singleplex experiments 

Escherichia coli strain EcBS2 was subjected to a single MAGE cycle to modify genes at 20 locations dispersed throughout the genome. Each electroporation was performed as a separate singleplex experiment, i.e. dominated by one oligo intended to modify one site. Trace amounts of three other oligos were included to restore function at the selectable loci cat−, kan− and bla−. The resulting cultures were grown under selective and non-selective conditions, and the extent of modification at each target site was assessed by AF-qPCR. The impact of CoS for these functions was calculated as a CoS factor F = η(selection)/η(no selection), where η is the AR frequency at the target site.

Observing CoS synergies 

AR frequency was measured for one MAGE cycle at a single chromosomal site, pphB, flanked by a pair of deactivated selectable marker genes, tolC and cat. Contexts ranged from singleplex (1 oligo, mole fraction = 1) to highly multiplexed (24 different oligos, each at mole fraction = 0.04). The groupings of sites and their targeted mutations are indicated in Supplementary Table S3. Trace amounts of CoS oligos (1% of total) were included to restore selectable function. The resulting cultures were grown under four conditions: selecting for the re-activated proximal marker (cat, <5 kb), the distal marker (tolC, >300 kb), both or neither.

Multicycle CoS MAGE with alternating selections 

Performing multiple cycles of CoS MAGE in series requires either many markers that can each be selected once, or a single marker that can be alternately selected in the on and off states. To address a set of 80 sites in the E. coli chromosome, we used many cycles of CoS MAGE with the dually selectable tolC gene.

When the CoS oligo was tolC_on, the post-electroporation culture was allowed to recover at 30°C with agitation for 1 h. Then 80 μl was plated on LB agar with carbenicillin and 0.005% sodium dodecyl sulphate (SDS) to isolate tolC+ colonies.

When the CoS oligo was tolC_off, the post-electroporation culture was allowed to recover and grow at 30°C with agitation for no less than 5 h. This culture was then grown to mid-log phase (OD600 of 0.4-0.6). At the same time, a known tolC+ culture was brought to the same state of growth to serve as a negative control. For each of these cultures, 20 μl was used to inoculate a tube containing 2 ml LB and 20 μl of colicin E1 preparation. These cultures were allowed to grow for 8-12 h. Each was then plated onto LB agar with carbenicillin to isolate colonies for screening (typically 100 μl of a 10⁻⁵ dilution of a mid-log culture or of a 10⁻⁶ dilution of a confluent culture).

Strain EcBS5 was used as the starting strain for this experiment. In this strain, the endogenous tolC gene has been seamlessly deleted and re-inserted within the region to be modified (40 targets on either side in the chromosome).
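The alternating schedule can be written down as a small Python planning helper. It combines the selections above with the cycle pattern reported in the Results. EcBS5 starts tolC+, so odd cycles use tolC_off plus every oligo of a 10-site group. Even cycles use tolC_on plus only the sites still unmodified after screening. `plan_cycle` and the site names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Selection applied after each CoS oligo, as described in the protocol above.
SELECTION = {
    "tolC_on": "plate on LB agar + carbenicillin + 0.005% SDS (selects tolC+)",
    "tolC_off": "grow with colicin E1 (kills tolC+), then plate on LB agar + carbenicillin",
}


def plan_cycle(cycle: int, group: Sequence[str], modified: Set[str]) -> Tuple[str, List[str], str]:
    """Return (CoS oligo, targeting oligos, selection) for one CoS-MAGE cycle.

    Odd cycles switch tolC off and target every site in the group; even cycles
    switch tolC back on and target only sites still unmodified.
    """
    if cycle % 2 == 1:
        cos_oligo, targets = "tolC_off", list(group)
    else:
        cos_oligo, targets = "tolC_on", [site for site in group if site not in modified]
    return cos_oligo, targets, SELECTION[cos_oligo]


group = [f"site_{i:02d}" for i in range(1, 11)]  # hypothetical 10-site group
cos, targets, _ = plan_cycle(1, group, modified=set())
print(cos, len(targets))  # tolC_off 10
# Suppose screening after cycle 1 found a clone with sites 1-7 converted.
cos, targets, _ = plan_cycle(2, group, modified=set(group[:7]))
print(cos, targets)  # tolC_on ['site_08', 'site_09', 'site_10']
```

The `modified` set would come from the MASC-qPCR and site-specific genotyping of the best clone picked after each cycle.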



RESULTS 

To efficiently modify genomes, we introduce several distinct oligos into the cells simultaneously; these integrate into actively replicating chromosomes at high frequency (10). For any given site, the specific oligo anneals to the lagging strand of the DNA replication fork and is resolved into one of the daughter cells (Figure 1) (17). Each site is modified by incorporating a unique oligo. Nevertheless, we hypothesized that oligos targeting multiple sites in close proximity should integrate into the same newly synthesized strand of the chromosome. When a daughter cell containing one such modification is isolated by selection, we expect it to be highly enriched for other modifications at nearby sites in a co-operative manner, a process we refer to as CoS (Figure 1). With this strategy, selectable genes can be used as CoS markers in various combinations across different regions of the genome to enhance MAGE. These markers can be pre-existing in the genome or inserted into the chromosome by double-stranded DNA recombineering (8), and then switched on or off by oligo-mediated AR.

Figure 1. Mechanism for MAGE AR with CoS. The dividing chromosome is schematized, with integration of a mutagenic oligo into the genome at a replication fork [adapted from Costantino and Court (4)]. An oligo electroporated into the cell is bound by multiple copies of the λ-bacteriophage β protein and anneals to the lagging strand during DNA replication. When multiple oligos are incorporated into nearby sites (black and gray rectangles), they are predicted to co-segregate at high frequency, often being inherited by the same daughter cell. Thus, a permissive replication fork seems to be a limiting factor in MAGE. Using one of these modifications to change the function of a selectable gene allows selection to remove unmodified cells.

To characterize the effect of CoS in E. coli, we first measured single AR frequencies at 20 sites spaced around the 4.6 Mb chromosome (Figure 2A). Each of the oligos used generated a single-base-pair silent mutation (stop codon TAG to TAA). These sites were drawn from a larger group of 314 such targeted codon replacements reported previously (11).

In addition, three CoS marker sites were chosen: two in close proximity to each other and one on the opposite replichore. These markers are inactivated bla (ampicillin/carbenicillin), kan (kanamycin) and cat (chloramphenicol) resistance genes on the chromosome, each encoding a reversible nonsense mutation. We used CoS oligos (bla_on, kan_on and cat_on) that can restore the selectable phenotype of these markers.

For each of the 20 sites, a 'singleplex' MAGE experiment targeted the modification of one TAG site while also including more dilute amounts of the three CoS oligos. Thus, each electroporation introduced one specific targeting oligo (5 μM) plus the three CoS oligos (0.05 μM each). We then measured the resulting AR frequencies with and without CoS.

CoS yielded notable enhancement of AR frequencies, quantified as a CoS factor: the ratio of AR frequencies with and without selection (Figure 2A). Consistent with our hypothesis, CoS enhanced AR frequencies especially at sites within ~500 kb of the selected CoS markers on the same replichore. (The rightmost data point in Figure 2A appears to show a strong cat CoS enhancement at a much greater distance. However, one of the two unselected AR frequencies at this locus was quite high, a potential outlier giving rise to the large error bar.) Moreover, greater enhancements were observed at sites in phase with the direction of the replication fork, i.e. 'downstream' of the CoS markers (farther from the origin of replication).
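The proximity rule suggested by these data can be encoded as a simple Python check when choosing a CoS marker for a set of targets. The marker should be on the same replichore, within ~500 kb, and preferably with targets downstream (farther from the origin). This is a sketch of that heuristic only, not a model that predicts CoS factors. Origin and terminus coordinates must be supplied by the user; they are not given in this section. The example uses toy coordinates, and all function names are hypothetical.

```python
def replichore(pos: int, ori: int, ter: int, genome_len: int) -> int:
    """1 if pos lies on the arc running clockwise from ori to ter, else 2."""
    return 1 if (pos - ori) % genome_len <= (ter - ori) % genome_len else 2


def distance_from_ori(pos: int, ori: int, ter: int, genome_len: int) -> int:
    """Distance travelled by the replication fork from ori to reach pos."""
    if replichore(pos, ori, ter, genome_len) == 1:
        return (pos - ori) % genome_len   # fork moves clockwise
    return (ori - pos) % genome_len       # fork moves counterclockwise


def cos_relationship(target: int, marker: int, ori: int, ter: int,
                     genome_len: int, window_bp: int = 500_000) -> str:
    """Classify a target site relative to a candidate CoS marker."""
    if replichore(target, ori, ter, genome_len) != replichore(marker, ori, ter, genome_len):
        return "opposite replichore"
    d_target = distance_from_ori(target, ori, ter, genome_len)
    d_marker = distance_from_ori(marker, ori, ter, genome_len)
    if abs(d_target - d_marker) > window_bp:
        return "same replichore, outside window"
    # downstream = replicated after the marker, i.e. farther from the origin
    return "downstream" if d_target > d_marker else "upstream"


G, ORI, TER = 4_000_000, 0, 2_000_000  # toy coordinates, not real E. coli positions
print(cos_relationship(1_300_000, 1_000_000, ORI, TER, G))  # downstream
print(cos_relationship(3_700_000, 3_900_000, ORI, TER, G))  # downstream
print(cos_relationship(3_000_000, 1_000_000, ORI, TER, G))  # opposite replichore
```

The second example shows why distance is measured along the fork's path rather than by raw coordinate. On the second replichore the fork travels toward decreasing positions, so the target at 3.7 Mb is replicated after the marker at 3.9 Mb.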



To further investigate these proximity effects, we assessed the effect of CoS in multiplex MAGE experiments using 24 target sites. These sites spanned a 320 kb region on the same replichore of the chromosome, flanked by the inactivated CoS markers cat− and tolC−. We characterized the AR frequencies at multiple sites, with and without CoS. Figure 2B shows AR frequencies for one representative site (pphB) within 5 kb of the cat− marker and more than 300 kb from the tolC− marker.

CoS for conversion of the cat− marker proximal to pphB produced a greater enhancement than CoS for conversion of the tolC− marker distal to pphB. CoS for both flanking markers produced the greatest enhancement. These effects are observed under singleplex conditions (one site modified, i.e. using one oligo plus CoS oligos). They are greatest under highly multiplexed conditions (up to 24 distinct oligos plus CoS oligos; see also Supplementary Figure S1 and Supplementary Table S3).

To explore in greater detail the extent to which CoS can enhance multiplexed genome engineering, we quantified the AR frequencies for 37 sites throughout the chromosome. These sites were divided into four subsets (A-D, Supplementary Table S2) in varying positions relative to two CoS markers, cat and kan, located on opposite replichores (Figure 3A). On Replichore 1, Group A sites are clustered in close proximity to the kan marker, and Group B sites are more dispersed. On Replichore 2, Group C sites are clustered in close proximity to the cat− marker, and Group D sites are more dispersed.

AR frequencies for these targets were evaluated in up to 10-plex MASC-PCR reactions (Supplementary Methods). This allowed us both to measure AR frequencies at individual sites (Figure 3B) and to assess the distributions of modifications in the resulting clones (Figure 3C). The frequencies of AR across the 37 target sites were measured by screening up to 48 isolated clones under each CoS condition (none, kanamycin, chloramphenicol or both). The average AR frequencies for these experiments are given in Supplementary Table S1.

When cells were not co-selected for restoration of either the kan or the cat marker, the AR frequency in the multiplexed reaction was low, averaging 3.7% per site. Targets in close proximity to the co-selected markers on the same replichore (Group A/kan and Group C/cat, <56 kb) showed the greatest frequency improvements under CoS, with average CoS factors of 3.3- to 5.5-fold. In contrast, CoS on the opposite replichore (Group A/cat and Group C/kan, >1.4 Mb) yielded a modest improvement, with CoS factors of 1.3- to 1.6-fold. When frequencies are plotted against the distance to the nearest CoS marker on the same replichore (Figure 3B), the greatest improvements cluster near these markers.

Figure 2. (A) 20 sites were separately modified throughout the E. coli genome in singleplex MAGE experiments in the presence of CoS oligos that re-activate the genes cat−, kan− and bla−. Resultant cultures were grown under selective and non-selective conditions, and the AR frequencies at each site were assessed by AF-qPCR. A CoS factor (the ratio of AR frequencies with and without CoS) is shown; factors greater than one indicate increased frequencies due to CoS. (B) AR frequency was measured for one MAGE cycle at a single chromosomal site (pphB) flanked by a pair of deactivated CoS markers. Contexts ranged from singleplex (one oligo, 100%) to highly multiplexed (24 oligos, each ~4% of the pool), not including the CoS oligos. Resulting cultures were grown under conditions selecting for the re-activated proximal marker (cat−, <5 kb), the distal marker (tolC−, >300 kb), both or neither.

[Figure 3C axes: co-selection antibiotic used (N = none, C = cat, K = kan); number of additional loci converted.]

Figure 3. Co-selection enhances multiplex AR. (A) Diagram of four groups of targets A-D based on proximity to the selectable loci kan (A, B) and cat (C, D). Each group contained 8-10 targets, not counting CoS markers (Supplementary Table S2). Groups A (eight targets) and C (nine targets) are within 56 kb of kan and cat, respectively, whereas Groups B and D (10 targets each) are within 694 kb of kan and cat, respectively. (B) AR frequency of Groups A-D mapped to distance to the nearest selected locus, kan or cat, and the increased frequency for each site when CoS was applied. (C) Frequency distribution of clones in the population with different numbers of additional conversions in each of the four Groups A-D, under no selection, cat, kan or cat/kan CoS. Same-replichore CoS is indicated in red lettering.

To evaluate the synergistic effects among sites under CoS, we further analyzed the distribution of conversions accumulated by individual clones in Groups A-D (Figure 3C, Supplementary Figure S4). Without CoS, most of the population (~70%) remained unmodified. Cross-replichore CoS showed marginal increases in the frequency at which mutants were found. In contrast, same-replichore CoS dramatically increased the frequency of mutants with large numbers of modifications. Double CoS of one marker on each replichore further increased multiplexed AR frequencies, giving rise to conversions in 70% of the cell population. These populations contained individuals with as many as 8 of 10 targeted conversions (Group A: six co-selected sites verified by MASC-PCR, plus the two marker sites confirmed by kan/cat double selection). These results show that a single cycle of MAGE can modify 8 or more spatially distinct loci in a single cell.

We posit that CoS in general isolates cells that have taken up more oligos, giving rise to the modest increases in AR frequencies during cross-replichore CoS. Moreover, proximity-based CoS (within ~500 kb) especially increases the likelihood of isolating cells whose chromosomes were at an optimal stage of replication for obtaining correlated AR events. This effect is reflected in our ability to easily isolate highly modified cells.

Without CoS, the average AR frequency across all sites of Groups A-D was 3.7 (±3.4)% per target. With double CoS of the kan and cat markers, the average was 15.6 (±9.4)%, a 4-fold improvement (Supplementary Figure S4). If these co-selected frequencies were independent of each other, the number of modifications per clone would follow a binomial distribution. For AR frequencies of 3.7%, only one colony in 1.5 × 10⁷ would contain six or more modifications out of eight (excluding the CoS markers). We would therefore need to screen at least 4.5 × 10⁷ clones for a 95% likelihood of obtaining one (Figure 4; see below for detailed calculations). Frequencies of 15.6% would require 10⁴ colonies to meet the same goal, a >4000-fold decrease in the scale of colony screening. However, in the experiment described above, the 6-conversion mutant was found by screening only 48 colonies with CoS.
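The binomial argument can be reproduced in a few lines of Python. Each of the eight sites is assumed to convert independently with the average per-site frequency, which is exactly the independence assumption that CoS is shown to violate. The function names are hypothetical; the calculation uses only the standard library (`math.comb` requires Python 3.8+).

```python
import math


def p_at_least(k: int, n_sites: int, p: float) -> float:
    """P(a clone carries >= k of n_sites conversions), sites converting independently with probability p."""
    return sum(math.comb(n_sites, i) * p**i * (1 - p)**(n_sites - i)
               for i in range(k, n_sites + 1))


def colonies_to_screen(k: int, n_sites: int, p: float, confidence: float = 0.95) -> int:
    """Smallest N such that P(at least one of N colonies has >= k conversions) >= confidence."""
    q = p_at_least(k, n_sites, p)
    # 1 - (1 - q)**N >= confidence  <=>  N >= log(1 - confidence) / log(1 - q)
    return math.ceil(math.log(1 - confidence) / math.log1p(-q))


for freq in (0.037, 0.156):
    q = p_at_least(6, 8, freq)
    n = colonies_to_screen(6, 8, freq)
    print(f"p = {freq:.1%}: 1 in {1 / q:.2g} clones; screen {n:,} for 95%")
```

Run as written, this gives roughly 1 in 1.5 × 10⁷ clones and about 4.4 × 10⁷ colonies at 3.7%, and about 10⁴ colonies at 15.6%. These are close to the figures quoted above, with a ratio of about 4500. `math.log1p` keeps the very small per-colony probability at 3.7% from being lost to rounding in `log(1 - q)`.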

We have consistently achieved this level of performance in CoS experiments using from 6 to 24 different oligos, yielding clones with 5-8 modifications per round of CoS-MAGE (Figure 5). We attribute our increased ability to enrich for highly modified cells to co-operative effects of CoS, which isolates groups of sites that are converted together. Using simple MASC-PCR or multiplex allele-specific colony qPCR (MASC-qPCR) methods, we can readily screen 100 or more colonies for conversion at up to 12 target sites simultaneously with a turnaround time of h. Without CoS, one must either screen or genotype an impractical number of clones, or expect to perform many more cycles of MAGE (11,15).

Figure 4. Colony screening requirement to isolate clones with the highest number of conversions based on a binomial model. The AR process is evaluated as a binomial process, with each site converted independently. The solid lines show the relationship between two quantities for AR with oligos targeting eight sites: the number of clones that must be screened from the population, and the highest number of conversions expected in one clone among those screened. Each line corresponds to a different average per-site AR frequency. AR frequencies of 3.7% and 15.6% are empirically determined values for 37 sites without and with CoS, respectively. At 95% confidence, the number of colonies needed to find a 6-conversion clone (indicated by the vertical gray dotted line) is shown as solid circles. If 100 colonies are screened, one expects to find 2- and 4-conversion clones for 3.7% and 15.6% AR efficiency, respectively (solid squares and horizontal gray dotted line). To find a 6-conversion clone by screening 100 colonies, we expect to need an average AR frequency per site of 35% (solid triangle), assuming a binomial process. In our experience, CoS yields a population of highly modified clones that consistently outperforms expectations based on independent AR frequencies.

Only a low concentration of the CoS oligos (relative to the total oligo concentration) is necessary to achieve CoS enhancement (Supplementary Figure S2). A low CoS-to-total oligo ratio minimizes competition between the CoS oligo and the rest of the oligo pool for entry into the cell. However, a lower fraction of CoS oligo leads to a smaller population of cells surviving selection: fewer cells recombine a molecule of the CoS oligo, so fewer cells survive. This bottleneck reduces the size of the population and diminishes the diversity accessed using MAGE. Thus, with more extreme dilutions of CoS oligos (0.01% of total or less, or when applying CoS at multiple CoS markers simultaneously), we have observed co-selected populations dominated by small numbers of genotypes. A smaller surviving population also produces a longer delay, as the selected culture grows back to the cell density required for the next MAGE cycle. We found that diluting the CoS oligo to 0.1-1% of the total oligo pool most consistently led to the greatest CoS enhancement (Supplementary Figures S2, S3 and S4) without overly restricting the cell population.

To illustrate how one might perform cycles of 
CoS-MAGE in series, we used this strategy to recode 
80 sites across one-fourth of the E. coli genome, 
spanning 1.1 megabase pairs (Figure 5). A single dually 
selectable tolC marker (11,18), inserted into the center of 
this region, was repeatedly re-used for this purpose in 
strain EcBS5. Odd cycles of co-selected MAGE 
switched the tolC gene off by removing the start 
codon, whereas even cycles reversed this change to 
switch tolC on. At each cycle, up to 191 colonies were 
screened by MASC-qPCR to identify highly modified 
clones. Groups of 10 sites as identified previously (11) 
were targeted for modification with each pair of cycles. 
Odd cycles included oligos to modify all 10 sites within a 
group, whereas even cycles were used to finish off any 
unmodified sites (from 2 to 6) of that group. Any sites 
still left unmodified were then carried over, i.e. included 
as targets for the following odd cycle. Thus for some 
cycles, 11 or 12 conversions were attempted simultan- 
eously (not including the CoS marker conversion). A 
total of 18 cycles of CoS-MAGE were used (Figure 5). 
All 80 sites were modified over the course of this
experiment, although one site (yegV, modified at cycle 2)
was inadvertently reverted to WT when an overlapping oligo
targeting another nearby site (yegW, at cycle 8) overwrote
it. Re-conversion of yegV to a TAA stop codon was attempted
once more (Figure 5, open square at cycle 18) but was
unsuccessful.
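The odd/even cycle bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in this work: `run_serial_cos_mage` and the `convert` callback are hypothetical names, and `convert` stands in for one round of MAGE, selection and MASC-qPCR screening, returning the sites found converted in the clone carried forward.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical callback: given the cycle number and the set of targeted sites,
# return the sites found converted in the clone chosen after screening.
Convert = Callable[[int, Set[str]], Set[str]]
LogEntry = Tuple[int, str, Set[str], Set[str]]


def run_serial_cos_mage(
    groups: List[List[str]], convert: Convert
) -> Tuple[List[LogEntry], Set[str]]:
    """Track targets for paired odd/even CoS-MAGE cycles.

    Returns a log of (cycle, tolC state after the cycle, targets, converted)
    and the set of sites still unmodified after the last group.
    """
    log: List[LogEntry] = []
    done: Set[str] = set()
    carried: Set[str] = set()
    cycle = 0
    for group in groups:
        # Odd cycle: switch tolC off (start codon removed; select with colicin E1)
        # while targeting the whole group plus any carried-over sites.
        cycle += 1
        targets = set(group) | carried
        hit = convert(cycle, targets) & targets
        done |= hit
        log.append((cycle, "off", targets, hit))

        # Even cycle: switch tolC back on (select with SDS) and target only the
        # still-unmodified sites of the current group.
        cycle += 1
        targets = set(group) - done
        hit = convert(cycle, targets) & targets
        done |= hit
        log.append((cycle, "on", targets, hit))

        # Sites still unmodified are carried over into the next odd cycle.
        carried = (set(group) | carried) - done
    return log, carried


# Toy run with a deterministic stand-in for screening results.
plan = {1: {"a1"}, 2: {"a2"}, 3: {"a3", "b1"}, 4: {"b2"}}
log, remaining = run_serial_cos_mage(
    [["a1", "a2", "a3"], ["b1", "b2"]], lambda cycle, targets: plan.get(cycle, set())
)
for cycle, tolc, targets, hit in log:
    print(cycle, tolc, sorted(targets), sorted(hit))
```

Even cycles are kept even when few sites remain, because tolC must be switched back on before the next odd cycle can select against it. The sketch simply returns any leftover sites rather than scheduling extra cycles for them.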

Figure 5. Serial CoS-MAGE to address 80 genome sites. Repeated cycles of MAGE used CoS at a dually selectable tolC locus to reformat all
amber (TAG) stop codons spanning one-fourth of the E. coli MG1655 genome. Odd-numbered cycles switched tolC off and selected for the
absence of the gene product using colicin E1. Even-numbered cycles switched tolC to an active state and selected for the presence of the
protein with SDS. After each cycle, colonies were screened by MASC-qPCR to identify clones containing the largest number of successful edits.
One cycle (15) performed poorly (1 conversion of 12 attempted) due to arcing during electroporation. Left: Progressive editing of the genome is
shown for each cycle, with sites successfully modified shown with solid squares and those unmodified shown with open squares. Site locations
are indicated relative to the inserted tolC CoS marker. Right: Numbers of conversions at each cycle. The greatest number of conversions was
obtained for cycles attempting 10 or more (versus 6 or fewer). These yielded 5-8 conversions per cycle, with a median (and mode) of 7, not
counting the conversion of the CoS marker.

Cycles attempting the most modifications also yielded
the most modifications (Figure 5). For example, when
10-12 modifications were attempted, the results ranged
from 5 to 8 sites modified (with a mode of seven sites).
This outcome suggests that the most efficient approach
to modifying a large group of sites would be to
maximize the number of sites addressed at every cycle
of the process. However, in experiments addressing as
many as 20 sites (data not shown), we have not yet
observed more conversions than when addressing 12.
Strategies that allow the screening of many more 
clones at many more sites are also likely to yield 
more highly modified clones. For screening by 
MASC-PCR or MASC-qPCR, multiplexed detection of
up to 12 sites per PCR reaction well has been
implemented. Performing these techniques more intensively
would permit screening a few thousand clones per day 
(e.g. 2-hour qPCR runs, 192 clones/run). If screening 
limits were abolished, we anticipate even more 
modified clones would be obtained. Other factors that 
likely affect the reach of CoS-MAGE are the number of 
oligo molecules entering the cell, the kinetics of oligo 
survival in the cell and the rate of oligo incorporation 
into the chromosome. Experiments to address these 
factors are beginning to provide further improvement, 
such as by limiting the action of endogenous nucleases 
(J.A. Mosberg, C.J. Gregg, et al., in preparation).
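The "few thousand clones per day" estimate follows from simple arithmetic. The sketch below assumes complete qPCR runs performed back to back with no setup time between them, which is an idealization; `clones_per_day`, `days_to_screen` and their parameters are hypothetical names.

```python
def clones_per_day(run_hours: float = 2.0, clones_per_run: int = 192,
                   hours_available: float = 24.0) -> int:
    """Clones genotyped per day if complete qPCR runs are performed back to back."""
    runs = int(hours_available // run_hours)  # only complete runs count
    return runs * clones_per_run


def days_to_screen(colonies: int, **kwargs) -> float:
    """Days of screening needed for a given number of colonies."""
    return colonies / clones_per_day(**kwargs)


print(clones_per_day())                     # continuous operation: 2304
print(clones_per_day(hours_available=8.0))  # a single 8-hour shift: 768
```

At roughly 2300 clones per day, the 10 420 colonies that the binomial model below predicts for a 6-conversion clone at p = 0.156 would take about 4.5 days to screen under these assumptions.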

In the experiment above, significant time and effort were
required to produce and screen colonies at each cycle.
Although a cycle of MAGE normally can be performed
in 3 h or less as part of a larger automation process (10),
including CoS requires at least a similar amount of
additional time for growth under selective conditions.
Screening at every cycle in that example further required
plating cells to grow colonies, followed by colony picking
and PCR-based colony screening, extending cycle times to
at least 2 days. The 80-site experiment was paused at
multiple points as convenient, so the ~40-day process took
approximately twice that long to execute. The choice of
frequent screening had
the benefit of yielding the most highly modified clones to 
pass to the next cycle. However, other applications of 
CoS-MAGE will likely find it most expedient to forgo 
frequent screening and simply take advantage of the 
strong enhancement provided by CoS, which can be 
4-fold at each site (Supplementary Figures S2 and S3). 
The latter approach would be especially preferred in 
applications where one wishes to maximize the degree of 
diversity generated in a modified population, such as for 
optimizing a metabolic pathway (10) or tuning a genetic 
circuit (12). 

We developed a mathematical model to anticipate the 
screening requirement when using MAGE to isolate 
highly modified clones, both with and without CoS. 
Modeling the multiplex AR as a simple binomial distribu- 
tion with conversion events at each target site occurring 
independently, we first assume an average AR frequency 
per site of p for a group of k sites. The abundance of 
clones with n mutations in the population after a single
MAGE cycle is $f = \binom{k}{n} p^{n} (1-p)^{k-n}$. To isolate
this n-mutation clone with 95% likelihood, the number of
colonies screened, s, must satisfy the condition
$(1-f)^{s} < 0.05$. When solving for s, we get






Nucleic Acids Research, 2012, Vol. 40, No. 17 e132



$s > \log(0.05)/\log(1-f)$, which is the number of colonies to
be isolated and screened to find at least one clone with n or
more mutations at 95% likelihood (Figure 4). For k = 8
and n = 6 at p = 0.037 (the AR frequency of 3.7%
without CoS observed above, Figure 3 and
Supplementary Table S1), we calculate s = 44 965 770.
At p = 0.156 (AR frequency of 15.6% observed with
CoS in the same experiment), we find that s = 10420.
This calculation indicates that CoS-MAGE could reduce 
screening requirements by >4000-fold under such 
conditions. Furthermore, with CoS-MAGE, we observe 
such highly modified clones at much higher frequencies 
than this (e.g. 1 in 50 instead of 1 in 10420 predicted). 
Consistent with our physical model (Figure 1), these 
results indicate that oligo incorporations into the same 
region of the chromosome can be highly correlated and 
not completely independent. 
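The binomial screening model can be reproduced in a few lines. This is a sketch rather than the code used for Figure 4; `abundance` and `colonies_needed` are hypothetical names. `abundance` computes f either for exactly n conversions, as in the formula above, or for n or more conversions.

```python
import math


def abundance(k: int, n: int, p: float, at_least: bool = False) -> float:
    """Model fraction f of clones with exactly n (or, if at_least, n or more)
    of k target sites converted, assuming independent conversion at rate p."""
    counts = range(n, k + 1) if at_least else [n]
    return sum(math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in counts)


def colonies_needed(f: float, confidence: float = 0.95) -> int:
    """Smallest integer s with (1 - f)**s < 1 - confidence,
    i.e. s > log(1 - confidence) / log(1 - f)."""
    # log1p(-f) keeps precision when f is tiny (about 7e-8 without CoS).
    bound = math.log(1 - confidence) / math.log1p(-f)
    return math.floor(bound) + 1


for p in (0.037, 0.156):
    f = abundance(8, 6, p)
    print(f"p = {p}: f = {f:.3e}, s = {colonies_needed(f)}")
```

With the exact-n form this gives s of about 4.5 × 10^7 at p = 0.037 and s = 10 420 at p = 0.156, in line with the values in the text. The 117 colonies quoted for p = 0.35 in the next paragraph correspond to the n-or-more form (`at_least=True`); the exact-n form gives 137 at that frequency.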

Examined another way, we find a 6-conversion clone in
our multiplex CoS experiment of eight target sites at an
abundance of 2% (f = 0.02); the above model would predict
this abundance only if p = 0.35 or 35%, more than twice
the observed value. At this AR frequency, we
need to screen 117 clones to confidently find a clone 
with six conversions or more, which is on par with our 
experimental findings. This analysis underscores two im- 
portant points. First, a 4-fold improvement in AR fre- 
quency per site translates to a dramatically decreased 
screening need. Second, as the binomial model predicts 
AR frequencies of 35% required to yield these 
6-conversion clones (in sharp contrast to the 15.6% fre- 
quency observed), the highly co-operative CoS process 
does not seem to follow a simple binomial distribution 
of independent events. 
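The inverse question, which per-site AR frequency p the binomial model would need in order to produce an observed abundance f, can be answered numerically. A minimal sketch using bisection, with the hypothetical names `abundance` and `implied_p`; it relies on f increasing with p, which holds for the exact-n form on [0, n/k] and for the n-or-more form on [0, 1].

```python
import math


def abundance(k: int, n: int, p: float, at_least: bool = False) -> float:
    """Binomial fraction of clones with exactly n (or >= n) of k sites converted."""
    counts = range(n, k + 1) if at_least else [n]
    return sum(math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in counts)


def implied_p(f_obs: float, k: int, n: int, at_least: bool = False,
              tol: float = 1e-10) -> float:
    """Bisect for the per-site frequency p at which the model abundance equals f_obs."""
    # Interval on which the model abundance increases monotonically with p.
    lo, hi = 0.0, (1.0 if at_least else n / k)
    if abundance(k, n, hi, at_least) < f_obs:
        raise ValueError("f_obs exceeds the largest abundance the model can produce")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if abundance(k, n, mid, at_least) < f_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(implied_p(0.02, k=8, n=6), 3))                 # exact-n form
print(round(implied_p(0.02, k=8, n=6, at_least=True), 3))  # n-or-more form
```

Either form places the implied p between roughly 0.33 and 0.35 for f = 0.02, more than twice the 15.6% measured with CoS.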

Another feature of CoS-MAGE is that it minimizes the 
number of growth cycles cells must spend in a mutator 
state. Performing MAGE at high efficiency typically has 
relied on cells deficient in mismatch repair, e.g. mutS⁻ strains (4).
(Otherwise, the cell's mismatch repair pathway attempts 
to 'repair' the genome edits that are being attempted.) 
However, performing MAGE this way also increases the 
rate of accumulation of background mutations in the 
genome (11). One strategy to avoid this limitation is to 
leave the repair pathway intact and instead use oligo 
sequences that create mismatches poorly recognized by 
mismatch repair, such as CC mismatches (4), multiple 
mismatches (17,19) and mismatches produced with 
chemically modified bases (19). However, these 
approaches place some sequence limitations on which 
genome edits can be made. 

When a genome engineering application requires 
shutting off mismatch repair, the amount of cell growth 
in the mutator state should be minimized. CoS-MAGE 
provides a benefit in this regard, requiring far fewer cycles 
(and cell divisions) to reach a given objective. In addition, 
we explored the possibility of turning mismatch repair off 
temporarily at the beginning of MAGE cycling, so that it 
could be turned back on when finished (Supplementary 
Figure S5A). A mutS_off oligo was designed to edit the 
ATG start codon of the mutS gene to ATC by creating a 
CC mismatch poorly recognized by the MutS protein,
turning the gene off but not deleting it as in previous 



studies (4). The mutS_off oligo was applied during the 
first cycle of a MAGE experiment where tolC on/off 
switching was also used as described earlier. The popula- 
tion fraction for the ATC (mutS_off) allele was measured 
for four cycles of co-selected MAGE. We anticipated that 
the mutS_off population would increase with each cycle, 
even though the mutS_off oligo was only applied in the 
first cycle: CoS-MAGE selects at each cycle for cells that 
have successfully taken up oligos and incorporated them 
into their genomes, and only mutS_off cells should do 
this (and survive) at high efficiency. We observed that the 
mutS_off cells became dominant in the population after 
only a few cycles of CoS-MAGE (Supplementary 
Figure S5B). 

We have observed the effects of CoS acting far from the 
site of a given CoS marker and on both sides of the marker 
(Figures 2A, 3B and 5). Nevertheless, an asymmetry was 
observed, most prominently in Figure 2A (a singleplex 
experiment) with CoS effectiveness dropping off sharply 
for sites closer to the origin than the marker. Figures 3B 
and 5 (multiplex experiments) may also indicate a measure 
of this asymmetry, but if so, these effects are more modest. 
The reason for this asymmetry — and why it might display 
most prominently for singleplex experiments — is unclear. 
Part of the explanation may lie in the nature of the 
replicating arms of the chromosome, as the copy 
number of genes (and thus numbers of targets for oligo 
annealing) upstream of the marker will generally be 
higher. 



DISCUSSION 

Currently, genomes can be engineered by different 
complementary approaches including complete de novo 
synthesis (20) and editing techniques such as MAGE 
(10). De novo synthesis offers the ability to create new 
genomes without a physical template but is limited by 
the difficulty and cost of in vitro DNA assembly, by the 
technical challenges of 'booting' a synthetic genome and 
by the biological challenges of designing a highly 
modified genome that will still support life. In 
contrast, MAGE relies on the manipulation of an 
existing genomic template in vivo to produce newly en- 
gineered variants without the need for total re-synthesis. 
Such template-mediated genome engineering is espe- 
cially attractive in cases where the new constructs 
share strong sequence similarity (>90%) with existing 
constructs. As our approach modifies an existing
genome through living intermediates, MAGE facilitates efficient
incorporation of specified mutations and real-time
viability testing to identify and avoid any lethal muta- 
tions. In contrast, de novo synthesis uses an 
all-or-nothing synthesis and boot approach that does 
not lend itself to easy troubleshooting. Furthermore, 
template-based engineering can benefit from natural se- 
lection processes as new genomes progress by directed 
steps from existing functional genomes. 

We have previously reported the development of 
hardware for efficient automation of MAGE processes 
(10). The selection steps of CoS-MAGE can be easily
incorporated into the cycles performed by this system for
obtaining the optimum combinations of modification, 
growth and selection. The CoS enhancement increases 
the ability of MAGE to make very large numbers of 
changes to a genome, especially when combined with 
other tools such as conjugative assembly genome
engineering (CAGE) (11). Having previously used MAGE and
CAGE together to make 314 genome edits in E. coli, we
anticipate that CoS-MAGE combined with CAGE could extend
this reach to thousands of sites. For
projects on a smaller scale, CoS can be used for a more 
modest number of cycles performed manually (without 
automation hardware) to modify dozens of sites (12). 

CoS strategies dramatically increase our genome 
engineering capabilities. We have demonstrated in 
several experiments that CoS-MAGE yields higher AR
frequencies, often improving by a factor of 4. This en- 
hancement is especially useful when making many modi- 
fications to a genome. For example, our recent report 
altering stop codons in E. coli (11) required 18 cycles of 
MAGE to produce an average of 8 modifications of 10 
targeted sites. In contrast, with CoS-MAGE we now 
easily isolate cells that incorporate an average of seven 
modifications (plus one or two modified CoS marker 
genes) after only a single cycle. Including CoS marker 
conversions, we have demonstrated at least nine simul- 
taneous modifications to the genome are possible. With 
further tuning or screening more clones, a greater number 
seems plausible. These increased frequencies are obtained 
using easily switchable genetic markers to co-select for 
several correlated AR events, targeting multiple chromo- 
somal sites spanning as much as a megabase pair of a 
genome (up to 500 kb from the selection gene in either 
direction). Cells containing many such chromosomal 
modifications can be isolated efficiently by screening 
and can then be used for subsequent CoS-MAGE 
cycles. Markers with both positive and negative selection 
options (e.g. tolC, galK and thyA) are readily available
for this purpose. Deploying these markers throughout the 
genome will generate programmable zones that are 
hyper-responsive to genome engineering. With the 
effects of CoS-MAGE spanning up to 1 megabase pair 
per marker, only a modest number of markers may be 
needed to fully address microbial genomes such as E. coli 
MG1655 (4.6 Mb).
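The marker-count estimate can be made concrete with an idealized covering calculation. The sketch assumes a circular 4.6 Mb chromosome, a symmetric CoS reach of 500 kb on each side of every marker, and evenly spaced markers. In practice the reach is asymmetric, as noted earlier, and marker placement is constrained by genome content. All function names are hypothetical.

```python
import math
from typing import Iterable, List


def markers_needed(genome_bp: int, reach_bp: int) -> int:
    """Minimum number of markers if each covers reach_bp on either side."""
    return math.ceil(genome_bp / (2 * reach_bp))


def place_markers(genome_bp: int, count: int) -> List[int]:
    """Evenly spaced marker positions (bp) on a circular chromosome."""
    spacing = genome_bp / count
    return [round(spacing * (i + 0.5)) for i in range(count)]


def circular_distance(a: int, b: int, genome_bp: int) -> int:
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(a - b) % genome_bp
    return min(d, genome_bp - d)


def uncovered(sites: Iterable[int], markers: List[int], genome_bp: int,
              reach_bp: int) -> List[int]:
    """Sites farther than reach_bp from every marker."""
    return [s for s in sites
            if min(circular_distance(s, m, genome_bp) for m in markers) > reach_bp]


GENOME_BP = 4_600_000  # MG1655, rounded to 4.6 Mb as in the text
REACH_BP = 500_000     # assumed symmetric CoS reach per side
count = markers_needed(GENOME_BP, REACH_BP)
markers = place_markers(GENOME_BP, count)
print(count, markers)
print(len(uncovered(range(0, GENOME_BP, 1000), markers, GENOME_BP, REACH_BP)))
```

Under these assumptions five markers suffice: with 920 kb spacing no site lies more than 460 kb from a marker, whereas four evenly spaced markers leave gaps of up to 575 kb.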